Search Results: "Lars Wirzenius"

7 January 2017

Lars Wirzenius: Hacker Noir, chapter 1: Negotiation

I participated in Nanowrimo in November, but I failed to actually finish the required 50,000 words during the month. Oh well. I plan on finishing the book eventually, anyway. Furthermore, as an open source exhibitionist I thought I'd publish a chapter each month. This will put a bit of pressure on me to keep writing, and hopefully I'll get some nice feedback too. The working title is "Hacker Noir". I've put the first chapter up on http://noir.liw.fi/.

14 December 2016

Antoine Beaupré: Debian considering automated upgrades

The Debian project is looking at possibly making automatic minor upgrades to installed packages the default for newly installed systems. While Debian has a reliable and stable package update system that has been an inspiration for multiple operating systems (the venerable APT), upgrades are usually a manual process on Debian for most users. The proposal was brought up during the Debian Cloud sprint in November by longtime Debian Developer Steve McIntyre. The rationale was to make sure that users installing Debian in the cloud have a "secure" experience by default, by installing and configuring the unattended-upgrades package within the images.

The unattended-upgrades package contains a Python program that automatically performs any pending upgrade and is designed to run unattended. It is roughly the equivalent of running apt-get update; apt-get upgrade in a cron job, but has special code to handle error conditions, warn about reboots, and selectively upgrade packages. The package was originally written for Ubuntu by Michael Vogt, a longtime Debian developer and Canonical employee.

Since there was a concern that Debian cloud images would be different from normal Debian installs, McIntyre suggested installing unattended-upgrades by default on all Debian installs, so that people have a consistent experience inside and outside of the cloud. The discussion that followed was interesting as it brought up the key issues one would face when deploying automated upgrade tools, outlining both the benefits and downsides of such systems.
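For concreteness, here is roughly what enabling the tool amounts to on a Debian system. This is a sketch based on the package's commonly documented APT "periodic" configuration (file name and values as found in, for example, the Debian wiki), not on anything from the proposal itself:

    // /etc/apt/apt.conf.d/20auto-upgrades
    APT::Periodic::Update-Package-Lists "1";   // refresh package lists daily
    APT::Periodic::Unattended-Upgrade "1";     // run unattended-upgrade daily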

Problems with automated upgrades

An issue raised in the following discussion is that automated upgrades may create unscheduled downtime for critical services. For example, certain sites may not be willing to tolerate a master MySQL server rebooting in conditions not controlled by the administrators. The consensus seems to be that experienced administrators will be able to solve this issue on their own, or are already doing so. For example, Noah Meyerhans, a Debian developer, argued that "any reasonably well managed production host is going to be driven by some kind of configuration management system" where competent administrators can override the defaults. Debian, for example, provides the policy-rc.d mechanism to disable service restarts on certain packages out of the box. unattended-upgrades also features a way to disable upgrades on specific packages that administrators consider too sensitive to restart automatically and want to schedule during maintenance windows.

Reboots were another issue discussed: how and when should kernel upgrades be deployed? Automating kernel upgrades may mean data loss if the reboot happens during a critical operation. On Debian systems, the kernel upgrade mechanisms already provide a /var/run/reboot-required flag file that tools can monitor to notify users of the required reboot. For example, some desktop environments will pop up a warning prompting users to reboot when the file exists. Debian doesn't currently feature an equivalent warning for command-line operation: Vogt suggested that the warning could be shown along with the usual /etc/motd announcement. The ideal solution here, of course, is reboot-less kernel upgrades, also known as "live patching" the kernel. Unfortunately, this area is still in development in the kernel (as was previously discussed here). Canonical deployed the feature for the Ubuntu 16.04 LTS release, but Debian doesn't yet have such a capability, since it requires extra infrastructure among other things.

Furthermore, system reboots are only one part of the problem. Currently, upgrading packages only replaces the code and restarts the primary service shipped with a given package. On library upgrades, however, dependent services may not necessarily notice and will keep running with older, possibly vulnerable, libraries. While libc6, in Debian, has special code to restart dependent services, other libraries like libssl do not notify dependent services that they need to restart to benefit from potentially critical security fixes. One solution to this is the needrestart package, which inspects all running processes and restarts services as necessary. It also covers interpreted code, specifically Ruby, Python, and Perl. In my experience, however, it can take up to a minute to inspect all processes, which degrades the interactivity of the usually satisfying apt-get install process. Nevertheless, it seems that needrestart is a key component of a properly deployed automated upgrade system.
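Returning to the two service-restart mechanisms mentioned above, here is a sketch of each. The blacklisted package is an example of my choosing; the blacklist syntax follows the commented samples shipped in /etc/apt/apt.conf.d/50unattended-upgrades:

    # policy-rc.d: exit status 101 tells invoke-rc.d to deny service starts/restarts
    printf '#!/bin/sh\nexit 101\n' > /usr/sbin/policy-rc.d
    chmod +x /usr/sbin/policy-rc.d

    # 50unattended-upgrades: hold back packages too sensitive to upgrade unattended
    Unattended-Upgrade::Package-Blacklist {
        "mysql-server";
    };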

Benefits of automated upgrades

One thing that was less discussed is the actual benefit of automating upgrades. It is merely described as "secure by default" by McIntyre in the proposal, but no one actually expanded on this much. For me, however, it is now obvious that any out-of-date system will be systematically attacked by automated probes and may be taken over to the detriment of the whole internet community, as we are seeing with Internet of Things devices. As Debian Developer Lars Wirzenius said:
The ecosystem-wide security benefits of having Debian systems keep up to date with security updates by default outweigh any inconvenience of having to tweak system configuration on hosts where the automatic updates are problematic.
One could compare automated upgrades with backups: if they are not automated, they do not exist and you will run into trouble without them. (Wirzenius, coincidentally, also works on the Obnam backup software.) Another benefit that may be less obvious is the acceleration of the feedback loop between developers and users: developers like to know quickly when an update creates a regression. Automation does create the risk of a bad update affecting more users, but this issue is already present, to a lesser extent, with manual updates. And the same solution applies: have a staging area for security upgrades, the same way updates to Debian stable are first proposed before shipping a point release. This doesn't have to be limited to stable security updates either: more adventurous users could follow rolling distributions like Debian testing or unstable with unattended upgrades as well, with all the risks and benefits that implies.

Possible non-issues

That there was not a backlash against the proposal surprised me: I expected the privacy-sensitive Debian community to react negatively to another "phone home" system, as it did with the Django proposal. This, however, is different from a phone-home system: it merely leaks package lists, and one has to leak that information to get the updated packages in the first place. Furthermore, privacy-sensitive administrators can use APT over Tor to fetch packages. In addition, the diversity of the mirror infrastructure makes it difficult for a single entity to profile users. Automated upgrades do imply a culture change, however: administrators approve changes only after the fact, as opposed to deliberately deciding to upgrade the parts they choose. I remember a time when I had to maintain proprietary operating systems and was reluctant to enable automated upgrades: such changes could mean degraded functionality or additional spyware. However, this is the free-software world, and upgrades generally come with bug fixes and new features, not additional restrictions.
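The Tor setup mentioned above is a small amount of work. A sketch: the apt-transport-tor package provides the tor+http method, and the mirror shown is just a regular Debian mirror reached through Tor:

    apt-get install tor apt-transport-tor
    # then, in /etc/apt/sources.list, use a tor+http:// URI:
    # deb tor+http://ftp.debian.org/debian stretch main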

Automating major upgrades?

While automating minor upgrades is one part of the solution to the problem of security maintenance, the other is how to deal with major upgrades. Once a release becomes unsupported, security issues may come up that affect older software. While Debian LTS extends release lifetimes significantly, it merely delays the inevitable major upgrade. In the grand scheme of things, the lifetimes of Linux systems (Debian: 3-5 years, Ubuntu: 1-5 years) are fairly short compared to other operating systems (Solaris: 10-15 years, Windows: 10+ years), which makes major upgrades especially critical. While major upgrades are not currently automated in Debian, they are usually pretty simple: edit sources.list, then:
    # apt-get update && apt-get dist-upgrade
But the actual upgrade process is really much more complex. If you run into problems with the above commands, you will quickly learn that you should have followed the release notes, a whopping 20,000-word, ten-section document that outlines all the gory details of the release. This is a real issue for large deployments and for users unfamiliar with the command line. The solution most administrators seem to use right now is to roll their own automated upgrade process. For example, the Debian.org system administrators have their own process for the "jessie" (8.0) upgrade. I have also written a specification of how major upgrades could be automated that attempts to take into account the wide variety of corner cases that occur during major upgrades, but it is currently at the design stage. This problem space is therefore generally unaddressed in Debian: Ubuntu does have a do-release-upgrade command, but it is Ubuntu-specific and would need significant changes in order to work on Debian.
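To illustrate what "roll their own" tends to look like in practice, here is a deliberately naive sketch. The release names are examples, and none of the corner cases from the release notes are handled, which is exactly the problem:

    #!/bin/sh
    set -e
    # point every suite reference at the new release
    sed -i 's/jessie/stretch/g' /etc/apt/sources.list /etc/apt/sources.list.d/*.list
    apt-get update
    # keep existing config files on conflicts, and never stop for questions
    DEBIAN_FRONTEND=noninteractive \
        apt-get -y -o Dpkg::Options::="--force-confdef" dist-upgrade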

Future work

Ubuntu currently defaults to "no automation" but, on install, invites users to enable unattended-upgrades or Landscape, a proprietary system-management service from Canonical. According to Vogt, the company supports both projects equally, as they differ in scope: unattended-upgrades just upgrades packages, while Landscape aims at maintaining thousands of machines and handles user management, release upgrades, statistics, and aggregation. It appears that Debian will enable unattended-upgrades by default on the images built for the cloud. For regular installs, the consensus that has emerged is that the Debian installer should prompt users, asking whether they want to disable the feature. One reason why this was not enabled before is that unattended-upgrades had serious bugs in the past that made it less attractive. For example, it would simply fail to follow security updates, a major bug that was fortunately promptly fixed by the maintainer.

In any case, it is important to distribute security and major upgrades on Debian machines in a timely manner. In my long experience of professionally administering Unix server farms, I have found upgrades to be a critical but time-consuming part of the job. During that time, I successfully deployed an automated upgrade system, using the simpler cron-apt, all the way back to Debian woody. This approach is, unfortunately, a little brittle and non-standard, and it doesn't address the need to automate major upgrades, for which I had to resort to tools like cluster-ssh or more specialized configuration management tools like Puppet. I therefore encourage any effort towards improving that process for the whole community. More information about the configuration of unattended-upgrades can be found in the Ubuntu documentation or the Debian wiki.
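(For reference, once unattended-upgrades is installed, the feature can typically be toggled after the fact through its debconf question, as described on the Debian wiki:)

    # dpkg-reconfigure -plow unattended-upgrades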
Note: this article first appeared in the Linux Weekly News.

Antoine Beaupré: Django debates privacy concern

In recent years, privacy issues have become a growing concern among free-software projects and users. As more and more software tasks become web-based, surveillance and tracking of users is also on the rise. While some software may use advertising as a source of revenue, which has the side effect of monitoring users, the Django community recently got into an interesting debate surrounding a proposal to add user tracking (actually, developer tracking) to the popular Python web framework.

Tracking for funding

A novel aspect of this debate is that the initiative comes from concerns of the Django Software Foundation (DSF) about funding. The proposal suggests that "relying on the free labor of volunteers is ineffective, unfair, and risky" and states that "the future of Django depends on our ability to fund its development". In fact, the DSF recently hired an engineer to help oversee Django's development, which has been quite successful in helping the project make timely releases with fewer bugs. Various fundraising efforts have resulted in major new Django features, but it is difficult to attract sponsors without some hard data on the usage of Django. The proposed feature tries to count the number of "unique developers" and gather some metrics of their environments by using Google Analytics (GA) in Django. The actual proposal (DEP 8) is done as a pull request, which is part of the Django Enhancement Proposal (DEP) process that is similar in spirit to the Python Enhancement Proposal (PEP) process. DEP 8 was brought forward by a longtime Django developer, Jacob Kaplan-Moss. The rationale is that "if we had clear data on the extent of Django's usage, it would be much easier to approach organizations for funding". The proposal is essentially about adding code to Django to send a certain set of metrics when "developer" commands are run. The system would be "opt-out": enabled by default unless turned off, although the developer would be warned the first time the phone-home system is used. The proposal notes that an opt-in system "severely undercounts" and is therefore not considered "substantially better than a community survey" that the DSF is already doing.

Information gathered

The pieces of information reported are specifically designed to run only in a developer's environment and not in production. The metrics identified are, at the time of writing:
  • an event category (the developer commands: startproject, startapp, runserver)
  • the HTTP User-Agent string identifying the Django, Python, and OS versions
  • a user-specific unique identifier (a UUID generated on first run)
The proposal mentions the use of the GA aip flag which, according to GA documentation, makes "the IP address of the sender 'anonymized'". It is not quite clear how that is done at Google and, given that it is a proprietary platform, there is no way to verify that claim. The proposal says it means that "we can't see, and Google Analytics doesn't store, your actual IP". But that is not actually what Google does: GA stores IP addresses, the documentation just says they are anonymized, without explaining how. GA is presented as a trade-off, since "Google's track record indicates that they don't value privacy nearly as high" as the DSF does. The alternative, deploying its own analytics software, was presented as making sustainability problems worse. According to the proposal, Google "can't track Django users. [...] The only thing Google could do would be to lie about anonymizing IP addresses, and attempt to match users based on their IPs". The truth is that we don't actually know what Google means when it "anonymizes" data: Jannis Leidel, a Django team member, commented that "Google has previously been subjected to secret US court orders and was required to collaborate in mass surveillance conducted by US intelligence services" that limit even Google's capacity of ensuring its users' anonymity. Leidel also argued that the legal framework of the US may not apply elsewhere in the world: "for example the strict German (and by extension EU) privacy laws would exclude the automatic opt-in as a lawful option". Furthermore, the proposal claims that "if we discovered Google was lying about this, we'd obviously stop using them immediately", but it is unclear exactly how this could be implemented if the software was already deployed. There are also concerns that an implementation could block normal operation, especially in countries (like China) where Google itself may be blocked. Finally, some expressed concerns that the information could constitute a security problem, since it would unduly expose the version number of Django that is running.
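To see how little machinery is involved, the kind of event described above reduces to a single HTTP request to GA's Measurement Protocol. A hypothetical sketch, not the proposal's actual code: the tracking ID is a placeholder, and the UUID would be read from wherever Django chose to store it rather than generated fresh:

    # v=1: protocol version; cid: the per-developer UUID; aip=1: anonymize the IP
    curl -s https://www.google-analytics.com/collect \
        -d v=1 -d tid=UA-000000-0 \
        -d "cid=$(uuidgen)" \
        -d t=event -d ec=commands -d ea=startproject \
        -d aip=1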

In other projects

Django is certainly not the first project to consider implementing analytics to get more information about its users. The proposal is largely inspired by a similar system implemented by the OS X Homebrew package manager, which has its own opt-out analytics. Other projects embed GA code directly in their web pages. This is apparently the option chosen by the Oscar Django-based ecommerce solution, but the DSF saw that as less useful, since it would count Django administrators, which was not considered as useful as counting developers. Wagtail, a Django-based content-management system, was incorrectly identified as using GA directly as well. It actually uses referrer information to identify installed domains through its version update checks, with opt-out. Wagtail didn't use GA because the project wanted only minimal data and was worried about users' reactions. NPM, the JavaScript package manager, also considered similar tracking extensions. Laurie Voss, the co-founder of NPM, said it decided to completely avoid phoning home, because "users would absolutely hate it". But NPM users are constantly downloading packages to rebuild applications from scratch, so it has more complete usage metrics, which are aggregated and available via a public API. NPM users seem to find this a "reasonable utility/privacy trade". Some NPM packages do phone home and have seen "very mixed" feedback from users, Voss said. Eric Holscher, co-founder of Read the Docs, said the project is considering using Sentry for centralized reporting, which is a different idea, but interesting considering Sentry is fully open source. So even though it is a commercial service (as opposed to the closed-source Google Analytics), it may be possible to verify any anonymity claims.
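Both of those precedents are easy to inspect from the command line; a couple of illustrative invocations (current syntax may have changed since):

    # Homebrew's analytics are opt-out:
    brew analytics off
    # npm publishes aggregated download counts via a public API instead:
    curl -s https://api.npmjs.org/downloads/point/last-month/express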

Debian's response

Since Django is shipped with Debian, one concern was the reaction of the distribution to the change. Indeed, "major distros' positions would be very important for public reception" of the feature, another developer stated. One of the current maintainers of Django in Debian, Raphaël Hertzog, explicitly stated from the start that such a system would "likely be disabled by default in Debian". There were two short discussions on Debian mailing lists where the overall consensus seemed to be that any opt-out tracking code was undesirable in Debian, especially if it was aimed at Google servers. I have done some research to see what, exactly, was acceptable as a phone-home system in the Debian community. My research revealed ten distinct bug reports against packages that would unexpectedly connect to the network, most of which were not directly about collecting statistics but more often about checking for new versions. In most cases I found, the feature was disabled. In the case of version checks, it seems right for Debian to disable the feature, because the package cannot upgrade itself: that task is delegated to the package manager. One of those issues was the infamous "OK Google" voice-activation binary blob controversy that was previously reported here and has since been fixed (although other issues remain in Chromium). I have also found that there is no clearly defined policy in Debian regarding tracking software. What I have found, however, is that there seems to be a strong consensus in Debian that any tracking is unacceptable. This is, for example, an extract of a policy that was drafted (but never formally adopted) by Ian Jackson, a longtime Debian developer:
Software in Debian should not communicate over the network except: in order to, and as necessary to, perform their function[...]; or for other purposes with explicit permission from the user.
In other words, opt-in only, period. Jackson explained that "when we originally wrote the core of the policy documents, the DFSG [Debian Free Software Guidelines], the SC [Social Contract], and so on, no-one would have considered this behaviour acceptable", which explains why no explicit formal policy has been adopted yet in the Debian project. One of the concerns with opt-out systems (or even prompts that default to opt-in) was well explained back then by Debian developer Bas Wijnen:
It very much resembles having to click through a license for every package you install. One of the nice things about Debian is that the user doesn't need to worry about such things: Debian makes sure things are fine.
One could argue that Debian has its own tracking systems. For example, by default, Debian will "phone home" through the APT update system (though it only reports the packages requested). However, this is currently not automated by default, although there are plans to do so soon. Furthermore, Debian members do not consider APT as tracking, because it needs to connect to the network to accomplish its primary function. Since there are multiple distributed mirrors (which the user gets to choose when installing), the risk of surveillance and tracking is also greatly reduced. A better parallel could be drawn with Debian's popcon system, which actually tracks Debian installations, including package lists. But as Barry Warsaw pointed out in that discussion, "popcon is 'opt-in' and [...] the overwhelming majority in Debian is in favour of it in contrast to 'opt-out'". It should be noted that popcon, while opt-in, defaults to "yes" if users click through the install process. [Update: As pointed out in the comments, popcon actually defaults to "no" in Debian.] There are around 200,000 submissions at this time, which are tracked with machine-specific unique identifiers that are submitted daily. Ubuntu, which also uses the popcon software, gets around 2.8 million daily submissions, while Canonical estimates there are 40 million desktop users of Ubuntu. This would mean there is about an order of magnitude more installations than what is reported by popcon. Policy aside, Warsaw explained that "Debian has a reputation for taking privacy issues very serious and likes to keep it".
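(Incidentally, a Debian user can review their own popcon choice at any time by rerunning its debconf question:)

    # dpkg-reconfigure popularity-contest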

Next steps

There are obviously disagreements within the Django project about how to handle this problem. It looks like the phone-home system may end up being implemented as a proxy system "which would allow us to strip IP addresses instead of relying on Google to anonymize them, or to anonymize them ourselves", another Django developer, Aymeric Augustin, said. Augustin also stated that the feature wouldn't "land before Django drops support for Python 2", which is currently estimated to be around 2020. It is unclear, then, how the proposal would resolve the funding issues, considering how long it would take to deploy the change and then collect the information so that it can be used to spur the funding efforts. It also seems the system may explicitly prompt the user, with an opt-out default, instead of just splashing a warning or privacy agreement without a prompt. As Shai Berger, another Django contributor, stated, "you do not get [those] kind of numbers in community surveys". Berger also made the argument that "we trust the community to give back without being forced to do so"; furthermore:
I don't believe the increase we might get in the number of reports by making it harder to opt-out, can be worth the ill-will generated for people who might feel the reporting was "sneaked" upon them, or even those who feel they were nagged into participation rather than choosing to participate.
Other options may also include gathering metrics in pip or PyPI, which was proposed by Donald Stufft. Leidel also proposed that the system could ask users to opt in only after the commands have been called a few times. That a community can discuss such issues without heating up too much is encouraging, and it shows great maturity on the part of the Django project. Every free-software project may be confronted with funding and sustainability issues. Django seems to be trying to address this in a transparent way. The project is willing to engage with the whole spectrum of the community, from the top leaders to downstream distributors, including individual developers. This practice should serve as a model, if not of how to do funding or tracking, at least of how to discuss those issues productively. Everyone seems to agree the point is not to surveil users, but to improve the software. As Lars Wirzenius, a Debian developer, commented: "it's a very sad situation if free software projects have to compromise on privacy to get funded". Hopefully, Django will be able to improve its funding without compromising its principles.
Note: this article first appeared in the Linux Weekly News.

25 November 2016

Iain R. Learmonth: vmdebootstrap Sprint Report

This is now a little overdue, but here it is. On the 10th and 11th of November, the second vmdebootstrap sprint took place. Lars Wirzenius (liw), Ana Custura (ana_c), and I were present. liw focussed on the core of vmdebootstrap, where he sketched out what the future of vmdebootstrap may look like. He documented this in a mailing list post and also gave a presentation (video). Ana and I worked on live-wrapper, which uses vmdebootstrap internally for the squashfs generation. I worked on improving logging, on using a better method for getting paths within the image, and on enabling generation of Packages and Release files for the image archive, and I also made the images installable (live-wrapper 0.5 onwards will include an installer by default). Ana worked on the inclusion of HDT and memtest86+ in the live images and enabled both ISOLINUX (for BIOS boot) and GRUB (for EFI boot) to boot the text-mode and graphical installers. live-wrapper 0.5 was released on the 16th of November with these fixes included. You can find live-wrapper documentation at https://live-wrapper.readthedocs.io/en/latest/. (The documentation still needs some work; some options may be incorrectly described.) Thanks to the sponsors that made this work possible. You're awesome. (:

22 November 2016

Lars Wirzenius: Debian miniconf in Cambridge

I spent a few days in Cambridge for a minidebconf. This is a tiny version of the full annual Debconf. We had a couple of days for hacking, and another two days for talks. I spent my hacking time thinking about vmdebootstrap (my tool for generating disk images with an installed Debian), and came to the conclusion that I need to atone for my sins of writing such crappy code by rewriting it from scratch to be nicer to use. I gave a talk about this, too. The mailing list post has the important parts, and meetings-archive has a video. I haven't started the rewrite, and it's not going to make it for stretch. I also gave two other talks, on the early days of Linux and on Qvarn, the latter being what I do at work. Thank you to ARM for sponsoring the location, and to the other sponsors for sponsoring food. These in-real-life meetings between developers are important for the productivity and social cohesion of Debian.

13 November 2016

Andrew Cater: Debian MiniConf, ARM, Cambridge 11/11/16 - Day 2 post 2

It's raining cats and dogs in Cambridge.

Just listening to Lars Wirzenius - who shared an office with Linus Torvalds, owned the computer that first ran Linux, and founded the Linux Documentation Project. Living history in more than one sense :)

Live streaming is also happening.

Building work is also happening - so there may be random noise happening occasionally.

Andrew Cater: Debian MiniConf, ARM Cambridge, 13/11/16 - Day 4 post 2

Just watching Lars Wirzenius talking about Qvarn - identity and data protection management on large scale. Compliant with EC data/identity management regulations and concerns.

The room fell silent at 1100 for two minutes - as we did on Friday 11/11/16.
This is remembering the dead, wounded and those affected by the wars of the 20th and 21st centuries.

Inevitably, it also reminded me of friends and colleagues in Debian that are no longer with us: for Espy and so many others before and since, thanks from me - you are well remembered here.

29 October 2016

Lars Wirzenius: Obnam 1.20 released

I have just released version 1.20 of Obnam, my backup program. It's been nine months since the previous release, and that's a long time: I've had an exciting year, and not entirely in a good way. Unfortunately that's eaten up a lot of my free time and enthusiasm for my hobby projects. See below for a snippet of NEWS, with a summary of the user-visible changes. A lot of the effort has gone into improving FORMAT GREEN ALBATROSS, but that isn't documented in the NEWS file. I've received patches and actionable bug reports from a number of people, and I'm grateful for those. I try to credit them by name in the NEWS file.

Obnam NEWS

This file summarizes changes between releases of Obnam.

NOTE: Obnam has an EXPERIMENTAL repository format under development, called green-albatross-20160813. It is NOT meant for real use. It is likely to change in incompatible ways without warning. DO NOT USE it unless you're willing to lose your backup.

Version 1.20, released 2016-10-29

Minor changes:

Bug fixes:
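For readers who haven't used Obnam, a minimal backup-and-restore round trip looks something like this (a sketch from memory of the manual; the repository path is an example):

    obnam backup --repository /srv/obnam-repo "$HOME"
    obnam generations --repository /srv/obnam-repo
    obnam restore --repository /srv/obnam-repo --to /tmp/restored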

9 September 2016

Lars Wirzenius: Thinking about CI, maybe writing ick2

A year ago I got tired of Jenkins and wrote a CI system for myself, Ick. It's served me well since, but it's a bit clunky and awkward and I have to hope nobody else wants to use it. I've been thinking about re-architecting Ick from scratch, and so I wrote down some of my thinking about this. It's very raw, but just in case someone else might be interested, I put it online at ick2. At this point I'm still thinking about very high level concepts. I've not written any code, and probably won't in the next couple of months. But I had to get this out of my brain.

22 August 2016

Lars Wirzenius: Linux 25 jubilee symposium

I gave a talk about the early days of Linux at the jubilee symposium arranged by the University of Helsinki CS department. Below is an outline of what I meant to speak about, but the actual talk didn't follow it exactly. You can compare these to the video once it comes online.

17 August 2016

Charles Plessy: Who finished DEP 5?

Many people worked on finishing DEP 5. I think that Lars's blog post does not show clearly enough how collective the effort was. Looking at the specification's text, one finds:
The following alphabetical list is incomplete; please suggest missing people:
Russ Allbery, Ben Finney, Sam Hocevar, Steve Langasek, Charles Plessy, Noah
Slater, Jonas Smedegaard, Lars Wirzenius.
The Policy's changelog mentions:
  * Include the new (optional) copyright format that was drafted as
    DEP-5.  This is not yet a final version; that's expected to come in
    the 3.9.3.0 release.  Thanks to all the DEP-5 contributors and to
    Lars Wirzenius and Charles Plessy for the integration into the
    Policy package.  (Closes: #609160)
 -- Russ Allbery <rra@debian.org>  Wed, 06 Apr 2011 22:48:55 -0700
and
debian-policy (3.9.3.0) unstable; urgency=low
  [ Russ Allbery ]
  * Update the copyright format document to the version of DEP-5 from the
    DEP web site and apply additional changes from subsequent discussion
    in debian-devel and debian-project.  Revise for clarity, to add more
    examples, and to update the GFDL license versions.  Thanks, Steve
    Langasek, Charles Plessy, Justin B Rye, and Jonathan Nieder.
    (Closes: #658209, #648387)
On my side, I am very grateful to Bill Allombert for having committed the document to the Git repository, which ended the debates.
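For context, what DEP 5 specifies is the machine-readable debian/copyright format. A minimal file in the final format looks like this (names and years are illustrative):

    Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
    Upstream-Name: example
    Source: https://example.org/example

    Files: *
    Copyright: 2011 Jane Doe <jane@example.org>
    License: GPL-2+

(A real file would normally also carry a standalone License paragraph spelling out the GPL-2+ boilerplate.)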

16 August 2016

Lars Wirzenius: 20 years ago I became a Debian developer

Today it is 23 years since Ian Murdock published his intention to develop a new Linux distribution, Debian. It is also about 20 years since I became a Debian developer and made my first package upload. In the time since: It's been a good twenty years. And the fun ain't over yet.

19 July 2016

Lars Wirzenius: Debugging over email

I write free software and I have some users. My primary support channels are email and IRC, which means I do not have direct access to the systems where my software runs. When one of my users has a problem, we go through one or more cycles of them reporting what they see and me asking them for more information, or asking them to try this or that thing and report the results. This can be quite frustrating. I want, nay, need to improve this. I've been thinking about this for a while, and talking with friends about it, and here are my current ideas.

First idea: have a script that gathers as much information as possible, which the user can run. For example, log files, full configuration, full environment, etc. The user would then mail the output to me. The information will need to be anonymised suitably so that no actual secrets are leaked. This would be similar to Debian's package-specific reportbug scripts.

Second idea: make it less likely that the user needs help solving their issue, with better error messages. This would require error messages to have sufficient explanation that a user can solve their problem. That doesn't necessarily mean a lot of text, but also code that analyses the situation when the error happens, includes the details relevant to resolving the problem, and gives error messages that are as specific as possible. Example: don't just fail saying "write error", but make the code find out why writing caused an error.

Third idea: in addition to better error messages, provide diagnostic tools as well. A friend suggested having a script that sets up a known-good set of operations and verifies they work. This would establish a known-working baseline, or smoke test, so that we can rule out things like "the software isn't completely installed". Do you have ideas? Mail me (liw@liw.fi) or tell me on identi.ca (@liw) or Twitter (@larswirzenius).
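As a sketch of the first idea above, here is what such a gather-everything script might look like. Everything in it is hypothetical (the program name, config location, and redaction rule) and would be tailored per project:

    #!/bin/sh
    # gather-support-info: collect what's needed for a useful bug report
    set -e
    out=$(mktemp -d)
    myprog --version > "$out/version" 2>&1
    cp ~/.config/myprog/myprog.conf "$out/" 2>/dev/null || true
    env | sort > "$out/environment"
    # crude anonymisation: redact anything that looks like a secret
    sed -i 's/\(password\|token\|secret\)=.*/\1=REDACTED/' "$out"/*
    tar -C "$out" -czf support-info.tar.gz .
    echo "Now mail support-info.tar.gz to the developer."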

15 July 2016

Lars Wirzenius: Two-factor auth for local logins in Debian using U2F keys

Warning: This blog post includes instructions for a procedure that can lead you to lock yourself out of your computer. Even if everything goes well, you'll be hunted by dragons. Keep backups, have a rescue system on a USB stick, and wear flameproof clothing. Also, have fun, and tell your loved ones you love them. I've recently gotten two U2F keys. U2F is an open standard for authentication using hardware tokens. It's probably mostly meant for website logins, but I wanted to have it for local logins on my laptop running Debian. (I also offer a line of stylish aluminium foil hats.) Having two-factor authentication (2FA) for local logins improves security if you need to log in (or unlock a screen lock) in a public or potentially hostile place, such as a cafe, a train, or a meeting room at a client. If they have video cameras, they can film you typing your password, and get the password that way. If you set up 2FA using a hardware token, your enemies will also need to lure you into a cave, where a dragon will use a precision flame to incinerate you in a way that leaves the U2F key intact, after which your enemies steal the key, log into your laptop, and leak your cat GIF collection. Looking up information on how to set this up, I found a blog post by Sean Brewer, for Ubuntu 14.04. That got me started. Here's what I understand: Here are the detailed steps for Debian stretch, with minute differences from those for Ubuntu 14.04. If you follow these and lock yourself out of your system, it wasn't my fault, you can't blame me, and look, squirrels! Also not my fault if you don't wear sufficient protection against dragons.
  1. Install pamu2fcfg and libpam-u2f.
  2. As your normal user, mkdir ~/.config/Yubico. The list of allowed U2F keys will be put there.
  3. Insert your U2F key and run pamu2fcfg -u$USER > ~/.config/Yubico/u2f_keys, and press the button on your U2F key when the key is blinking.
  4. Edit /etc/pam.d/common-auth and append the line auth required pam_u2f.so cue.
  5. Reboot (or at least log out and back in again).
  6. Log in, type in your password, and when prompted and the U2F key is blinking, press its button to complete the login.
pamu2fcfg reads the hardware token and writes out its identifying data in a form that the PAM module understands; see the pam-u2f documentation for details. The data can be stored in the user's home directory (my preference) or in /etc/u2f_mappings. Once this is set up, anything that uses PAM for local authentication (console login, GUI login, sudo, desktop screen lock) will need to use the U2F key as well. ssh logins won't. Next, add a second key to your u2f_keys. This is important, because if you lose your first key, or it's damaged, you'll otherwise have no way to log in.
  1. Insert your second U2F key and run pamu2fcfg -n > second, and press the second key's button when prompted.
  2. Edit ~/.config/Yubico/u2f_keys and append the output of second to the line with your username.
  3. Verify that you can log in using your second key as well as the first key. Note that you should have only one of the keys plugged in at the same time when logging in: the PAM module wants the first key it finds so you can't test both keys plugged in at once.
This is not too difficult, but it is rather fiddly, and it'd be nice if someone wrote a tool to manage the list of U2F keys in a nicer way.
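Condensed into a transcript, the first-key setup (steps 1-4 above) is just this; the same warnings apply, and it's wise to keep a root shell open in another terminal until you've verified you can still log in:

    sudo apt-get install pamu2fcfg libpam-u2f
    mkdir -p ~/.config/Yubico
    # press the key's button when it blinks
    pamu2fcfg -u"$USER" > ~/.config/Yubico/u2f_keys
    # back up common-auth before touching it, then append the rule
    sudo cp /etc/pam.d/common-auth /etc/pam.d/common-auth.orig
    echo 'auth required pam_u2f.so cue' | sudo tee -a /etc/pam.d/common-auth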

19 June 2016

Lars Wirzenius: New APT signing key for code.liw.fi/debian

For those who use my code.liw.fi/debian APT repository, please be advised that I've today replaced the signing key for the repository. The new key has the following fingerprint:
8072 BAD4 F68F 6BE8 5F01  9843 F060 2201 12B6 1C1F
I've signed the key with my primary key and sent the new key with signature to the key servers. You can also download it at http://code.liw.fi/apt.asc.
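Switching to the new key is then a matter of the following (the local file name is arbitrary; verify that the fingerprint printed matches the one above before trusting the key):

    wget -qO - http://code.liw.fi/apt.asc > liw-apt.asc
    gpg --with-fingerprint liw-apt.asc   # compare against the fingerprint above
    sudo apt-key add liw-apt.asc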

10 May 2016

Lars Wirzenius: Qvarn Platform announcement

In March we started a new company, to develop and support the software whose development I led at my previous job. The software is Qvarn, and it's fully free software, licensed under AGPL3+. The company is QvarnLabs (no website yet). Our plan is to earn a living from this, and our hope is to provide software that is actually useful for helping various organisations handle data securely. The first press release about Qvarn was sent out today. We're still setting up the company and getting operational, but a little publicity never hurts. (Even if it is more marketing-speak and self-promotion than I would normally put on my blog.) So this is what I do for a living now.

The development of the open source Qvarn Platform was led by QvarnLabs CEO Kaius Häggblom (left) and CTO Lars Wirzenius. Helsinki, Finland 10.05.2016

With Privacy by Design, integrated Gluu access management and comprehensive support for regulatory data compliance, Qvarn is set to become the Europe-wide platform of choice for managing workforce identities and providing associated value-added services. Construction industry federations in Sweden, Finland and the Baltic States have been using the Qvarn Platform (http://www.qvarn.org) since October 2015 to securely manage the professional digital identities of close to one million construction workers. Developed on behalf of these same federations, Qvarn is now free and open source software, making it a compelling solution for any organization that needs to manage a secure register of workers' data.

"There is something universal and fundamental at the core of the Qvarn platform. And that's trust," said Qvarn evangelist Kaius Häggblom. "We decided to make it free, open source and include Gluu access management because we wanted all those using Qvarn or contributing to its continued development to have the freedom to work with the platform in whatever way is best for them."

Qvarn has been designed to meet the requirements of the European Union's new General Data Protection Regulation (GDPR), enabling organizations that use the platform to ensure their compliance with the new law. Qvarn has also incorporated the principles of Privacy by Design to minimize the disclosure of non-essential personal information and to give people more control over their data.

"Today, Qvarn is used by the construction industry as a way to manage the data of employees, many of whom frequently move across borders. In this way the platform helps to combat the grey economy in the building sector, thereby improving quality and safety, while simultaneously protecting the professional identity data of almost a million individuals," said Häggblom. "Qvarn is so flexible and secure that we envision it becoming the preferred platform for the provision of any value-added services with an identity management component, eventually even supporting monetary transactions."

Qvarn is a cloud-based solution supported to run on both Amazon Web Services (AWS) and OpenStack. In partnership with Gluu, the platform delivers an out-of-the-box solution that uses open and standard protocols to provide powerful yet flexible identity and access management, including mechanisms for appropriate authentication and authorization.

"Qvarn's identity management and governance capabilities perfectly complement the Gluu Server's access management features," said Founder and CEO of Gluu, Michael Schwartz. "Free open source software (FOSS) is essential to the future of identity and access management. And the FOSS development methodology provides the transparency that is needed to foster the strong sense of community upon which a vibrant ecosystem thrives."

Qvarn's development team continues to be led by recognized open source developer and platform architect Lars Wirzenius. He has been developing free and open source software for 30 years and is a renowned expert in the Linux environment, with a particular focus on the Debian distribution. Lars works at all levels of software development, from writing code to designing system architecture.

About the Qvarn Platform: The Qvarn Platform is free and open source software for managing workforce identities.
Qvarn is integrated with the Gluu Server's access management features out of the box, using open and standard protocols to provide the platform with a single common digital identity and mechanisms for appropriate authentication and authorization. A cloud-based solution, Qvarn is supported to run on both Amazon Web Services (AWS) and OpenStack. Privacy by Design is central to the architecture of Qvarn, and the platform has been third-party audited to a security level of HIGH. http://www.qvarn.org For more information, please contact:
Andrew Flowers
andrew.flowers@ellisnichol.com
+358 40 161 5668

31 March 2016

Lars Wirzenius: New job: QvarnLabs

Today was my last day at Suomen Tilaajavastuu, where I worked on Qvarn. Tomorrow is my first day at my new job. The new job is for a new company, tentatively named QvarnLabs (registration is in progress), to further develop and support Qvarn. The new company starts operation tomorrow, so you'll have to excuse me that there isn't a website yet. Qvarn provides a secure, RESTful JSON HTTP API for storing and retrieving data, with detailed access control (and I can provide more buzzwords if necessary). If you operate in the EU and store information about people, you might want to read up on the General Data Protection Regulation, and Qvarn may be part of a solution you want to look into, once we have the website up.

28 March 2016

Lars Wirzenius: Obnam user survey, 2016

In January and February of 2016 I ran an Obnam user survey. I'm not a statistician, but here is my analysis of the results. Executive summary: Obnam is slow, buggy, and the name is bad. But they'd like to buy stickers and t-shirts.

Method

I wrote up a long list of questions about things I felt were of interest to me. I used Google Forms to collect responses, exported them as a CSV file, and analysed them based on that. I used Google Forms, even though it is not free software, as it was the easiest service I got to work that also seemed like it'd be nice for people to use. I could have run the survey using Ikiwiki, but it wouldn't have been nearly as nice. I could have found and hosted some free software for this, but that would have been much more work. Most questions had free-form text responses, and this was both good and bad. It was good, because many of the responses included things I could never have expected. It was bad, because it took me a lot more time and effort to process them. I think next time I'll keep the number of free-text responses down. For some of the questions, I hand-processed the responses into a more or less systematic form, in order to count things with a bit of code. For others, I did not, and show the full list of responses (I'm lazy, and we don't need a survey to determine that).

The responses

See http://code.liw.fi/obnam/survey-2016.html for the responses, after hand-processing. For the questions for which it makes sense, a script has tabulated the various responses and calculated percentages. I haven't produced graphs, as I don't know how to do that easily. (Maybe next time I'll enlist the help of statisticians.)

Conclusions

12 March 2016

Lars Wirzenius: Not-platform for Debian project leader elections 2016

After some serious thinking, I've decided not to nominate myself in the Debian project leader elections for 2016. While I was doing that, I wrote the beginnings of a platform, below. I'm publishing it to have a record of what I was thinking, in case I change my mind in the future, and perhaps it can inspire other people to do something I would like to happen.

Why not run?

I don't think I want to deal with the stress. I already have more than enough stress in my life, from work. I enjoy my obscurity in Debian. It allows me to go away for long periods of time, and to ignore any discussions, topics, and people that annoy or frustrate me, if I don't happen to want to tackle them at any one time. I couldn't do that if I were DPL.

NOT a platform for Debian project leader election, 2016

Apart from what the Debian constitution formally specifies, I find that the important duties of the Debian project leader are: I do not feel it is the job of the DPL to set goals for the project, technical or otherwise, any more than it is the job of any other member of the project. Such goals tend to come best from enthusiastic individual developers who want something and are willing to work on it. The DPL should enable such developers, and make sure they have what they need to do the work.

My plan, if elected
  1. Keep Debian running. Debian can run for a long time effectively on autopilot, even if the DPL vanishes, but not indefinitely. At minimum, the DPL should delegate the secretary and technical committee members, and decide on how money should be spent. I will make sure this minimum level is achieved.
  2. While I have no technical goals to set for the project, I have an organisational one. I believe it is time for the project to form a social committee whose mandate is to step in and help resolve conflicts in their early stages, before they grow big enough that the DPL, the tech-ctte, listmasters, or the DAM need to be involved. See below for more details on this. If I am elected, I will do my best to get a social committee started, and I will assume that any vote for me is also a vote for a social committee.
Social committee

(Note: It's been suggested that this is a silly name, but I haven't had time to come up with anything better. I already rejected "nanny patrol".) We are a big project now. Despite our reputation, we are a remarkably calm project, but there are still occasional conflicts, and some of them spill out into our big mailing lists. We are not very good, as a project, at handling such situations. It is not a new idea, but I think its time has come, and I propose that we form a new committee, a social committee, whose job is to help de-escalate conflicts while they are still small, to avoid them growing into big problems, and to help resolve big conflicts if they still happen. This is something the DPL has always been doing. People write to the DPL to ask for mediation, or other help, when they can't resolve a situation by themselves. We also have the technical committee, listmasters, GRs, and the expulsion process defined by the DAM. These are mostly heavy-weight tools, and by the time it's time to consider their use, it's already too late to find a good solution. Having the DPL do this alone puts too much pressure on one person. We've learnt that important tasks should generally be handled by teams rather than just one person. Thus, I would like us to have a social committee that:

About me

I've been a Debian developer since 1996. I've been retired twice, while I spent large amounts of time on other things. I haven't been a member of any important team in Debian, but I've been around long enough that I know many people, and have a reasonable understanding of how the project works.

3 February 2016

Jonathan Dowland: Comparing Docker images

I haven't written much yet about what I've been up to at work. Right now, I'm making changes to the sources of a set of Docker images. The changes I'm making should not result in any changes to the actual images: it's just a re-organisation of the way in which they are built. I've been using the btrfs storage driver for Docker which makes comparing image filesystems very easy from the host machine, as all the image filesystems are subvolumes. I use a bash script like the following to make sure I haven't broken anything:
#!/bin/bash
# Compare two Docker images, given either as tags or as full 64-hex image IDs.
oldid="$1"; newid="$2"

id_in_canonical_form() {
    echo "$1" | grep -qE '^[a-f0-9]{64}$'
}

canonicalize_id() {
    docker inspect --format '{{ .Id }}' "$1"
}

# Accept either image names or raw IDs: resolve names to IDs.
id_in_canonical_form "$oldid" || oldid="$(canonicalize_id "$oldid")"
id_in_canonical_form "$newid" || newid="$(canonicalize_id "$newid")"

cd "/var/lib/docker/btrfs/subvolumes"

# Emit one line per file: permissions, owner, group, size, path.
sumpath() {
    cd "$1" && find . -printf "%M %4U %4G %16s %h/%f\n" | sort
}

diff -ruN "$oldid" "$newid"
diff -u <(sumpath "$oldid") <(sumpath "$newid")
Using -printf means I can ignore changes in the timestamps on files, which is something I am not interested in. If it is available in your environment, Lars Wirzenius' tool Summain generates manifests that include a file checksum and could be very useful for this use case.
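Invoking the script is then simply this (the script name is my own placeholder; either image tags or full 64-character IDs work, thanks to the canonicalization step):

    ./docker-image-diff.sh myimage:before myimage:after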
